STREPTOCOCCUS PNEUMONIAE DNA SEQUENCES

This invention provides DNA sequences from the
Streptococcus pneumoniae genome, and methods of using DNA
fragments derived therefrom in a variety of biological
and pharmaceutical applications.

The recent emergence of widespread antibiotic
resistance in common pathogenic bacterial species has
justifiably alarmed the medical and research communities.
Frequently these organisms are co-resistant to several
different antibacterial agents. Particularly problematic has
been the emergence and rapid spread of penicillin resistance
in Streptococcus pneumoniae, which frequently causes upper
respiratory tract infections. Resistance to penicillin in
this organism can be due to modifications of one or more of
the penicillin-binding proteins (PBPs). Combating the
phenomenon of increasing resistance to antibiotic agents
among pathogenic organisms such as Streptococcus pneumoniae
will require intensified research into the fundamental
molecular biology of such organisms. Greater knowledge about
the molecular biology of pathogenic organisms will lead to
new antibacterial agents having novel and effective actions.

While inroads in the development of new antibiotics and
new targets for antibiotic compounds have been made with a
variety of microorganisms, progress has been less apparent
in Streptococcus pneumoniae. In part, Streptococcus
pneumoniae presents a special case because this organism is
highly recombinogenic and readily takes up exogenous DNA
from its surroundings, which can facilitate the acquisition
and spread of resistance determinants. Thus, the need for
new antibacterial compounds and new targets for antibacterial
therapy in Streptococcus pneumoniae is more acute than in
other organisms.



The present invention relates to the genome of S.
pneumoniae. The genomic information disclosed by the present
invention enables: (1) preparation of molecular
hybridization probes and PCR primers for use in amplification
of genes and regulatory regions, physical mapping, sequencing,
mutagenesis, and mutation analysis, (2) homology comparisons
with the genomes and open reading frames (ORFs) of other
organisms, (3) creation of specifically mutated strains of
S. pneumoniae wherein the mutation is targeted to any site
or sites in the DNA sequence disclosed herein, (4)
identification of S. pneumoniae promoters and other gene
regulatory sequences, (5) identification of proteins/ORFs
encoded by S. pneumoniae, (6) identification of virulence
genes in S. pneumoniae, (7) determination of the biological
function of proteins/ORFs and RNAs encoded by S. pneumoniae,
(8) production of kits useful for determining gene function
in the cell, and kits for isolating and analyzing genes that
are mutated in antibiotic-resistant clinical isolates of S.
pneumoniae, (9) production of proteins and RNAs encoded by
S. pneumoniae, (10) production of antibodies against
proteins and other antigens encoded by S. pneumoniae, and
(11) methods to identify compounds that bind to proteins and
RNAs encoded by S. pneumoniae as potential new antibiotic
compounds.

In another embodiment, the invention relates to
substantially purified proteins encoded by the S. pneumoniae
genome.

Table 1 summarizes the proteins and nucleic acids
disclosed herein, contigs, SEQ ID NOs, and predicted
functions.
"Genome" refers to the full complement of chromosomal
and extra-chromosomal DNA within a cell. The genome 
comprises the genetic blueprint for all proteins and RNAs 
encoded by the cell or organism. 
"ORF" (i.e., "open reading frame") designates a region
of genomic DNA beginning with an ATG (Met) or other initiation
codon and terminating with a translation stop codon,
potentially encoding a protein product. "Partial ORF" means
a portion of an ORF as disclosed herein such that the
initiation codon, the stop codon, or both are not disclosed.
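The ORF definition above can be illustrated with a short scan for start-to-stop stretches in the three forward reading frames. This is a minimal sketch, not the procedure used to annotate Table 1: the set of initiation codons (ATG, GTG, TTG), the minimum length, and the function name `find_orfs` are assumptions chosen for illustration, and the reverse strand is not scanned.

```python
# Hypothetical illustration of the "ORF" definition (forward strand only).
START_CODONS = {"ATG", "GTG", "TTG"}  # assumption: common bacterial initiation codons
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 30) -> list[tuple[int, int, int]]:
    """Return (frame, start, end) for each ORF found in the three forward frames.

    `start` is the 0-based index of the initiation codon and `end` is the
    exclusive index just past the stop codon. Only the first initiation codon
    after the previous stop (or the frame start) opens an ORF, and ORFs with
    fewer than `min_codons` codons before the stop are discarded. A stretch
    that runs off the end of `seq` without a stop (a partial ORF) is not reported.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None:
                if codon in START_CODONS:
                    start = i
            elif codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((frame, start, i + 3))
                start = None
    return orfs
```

For example, `find_orfs("CCATGAAATAACC", min_codons=1)` reports one ORF in frame 2 spanning indices 2 to 11. Real annotation would also scan the reverse complement and use statistical gene models rather than a fixed length cutoff.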

"DNA chip" or "Bio Chip" or "Bio DNA chip" refers to a 
solid matrix or support onto which is applied an array of 
oligonucleotides, or nucleotide sequences, or gene 
fragments, or genomic fragments, of S. pneumoniae which may 

further comprise a layer of S. pneumoniae cells suspended
thereover in a semisolid medium such as agar or agarose. 


"Consensus sequence" refers to an amino acid or 
nucleotide sequence that may suggest the biological function 
of a protein, DNA, or RNA molecule. Consensus sequences are 
identified by comparing proteins, RNAs, and gene homologs
from different species. 

"Contiguous fragment building" or "Contiguous fragment" 
or "Contig" refers to the process and result, respectively,
by which a fragment of DNA is assembled from smaller
constituent DNA fragments by arranging the constituent
pieces in their correct order and register such that the 
resulting contiguous fragment accurately depicts the native 
DNA sequence from which the smaller fragments originated. 
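Contig building as defined above can be sketched as a greedy merge of fragments by their longest exact suffix-prefix overlap. This is a toy illustration under stated assumptions (error-free fragments, all from one strand, no repeats longer than the overlaps); practical assemblers must also handle sequencing errors, both strands, and repeats. The names `overlap` and `greedy_assemble` are hypothetical.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of `a` equal to a prefix of `b`, or 0 if below min_len."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(fragments: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of fragments with the largest exact overlap.

    Returns one or more contigs; fragments that never overlap stay separate.
    """
    frags = list(dict.fromkeys(fragments))  # drop exact duplicates, keep order
    # Drop fragments wholly contained in another fragment.
    frags = [f for f in frags if not any(f != g and f in g for g in frags)]
    while len(frags) > 1:
        best_n, best_i, best_j = 0, -1, -1
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            break  # no remaining overlaps: leave separate contigs
        merged = frags[best_i] + frags[best_j][best_n:]
        frags = [f for k, f in enumerate(frags) if k not in (best_i, best_j)] + [merged]
    return frags
```

With the four overlapping fragments `ATTAGACCTG`, `CCTGCCGGAA`, `AGACCTGCCG`, and `GCCGGAATAC`, the sketch returns the single contig `ATTAGACCTGCCGGAATAC`.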
"Computer readable medium" includes, for example, a 
floppy disc, hard disc, random access memory, read only
memory, and CD-ROM. 

The terms "cleavage" or "restriction" of DNA
refer to the catalytic cleavage of the DNA with a
restriction enzyme that acts only at certain sequences in
the DNA (viz., sequence-specific endonucleases). The various
restriction enzymes used herein are commercially available,
and their reaction conditions, cofactors, and other
requirements are used in the manner well known to one of
ordinary skill in the art. Appropriate buffers and substrate
amounts for particular restriction enzymes are specified by
the manufacturer or can be found in the literature.
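Because a restriction enzyme cuts only at its recognition sequence, the fragments produced from a known sequence can be predicted in silico. The sketch below uses EcoRI (recognition site GAATTC, cut between G and A) as an assumed example, treats the sequence as linear, and reports top-strand fragments only; `digest` is a hypothetical helper name.

```python
import re


def digest(seq: str, site: str = "GAATTC", cut_offset: int = 1) -> list[str]:
    """Top-strand fragments of a linear sequence cut at every recognition site.

    `cut_offset` is the position of the cut within the site; the defaults
    model EcoRI (G^AATTC). A zero-width lookahead lets overlapping
    occurrences of the site be found.
    """
    seq = seq.upper()
    pattern = f"(?={re.escape(site.upper())})"
    cuts = [m.start() + cut_offset for m in re.finditer(pattern, seq)]
    bounds = [0] + cuts + [len(seq)]
    # Skip empty pieces (e.g. a cut exactly at position 0).
    return [seq[a:b] for a, b in zip(bounds, bounds[1:]) if b > a]
```

For instance, `digest("GAATTCAAGAATTC")` yields `["G", "AATTCAAG", "AATTC"]`. Other enzymes can be modeled by changing `site` and `cut_offset`, provided the site contains no ambiguity codes.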

"Diagnostics" as used herein relates to in vitro 
or in vivo diagnosis of disease states or biological status
in mammals, preferably humans.

"Therapeutics" and "therapeutic/diagnostic 
combinations" mean the treatment, or diagnosis and
treatment, of disease states or biological status by in vivo
administration to mammals, preferably humans, of
compositions of the present invention, for example,
antibodies.

"Essential genes" or "essential ORFs" or 
"essential proteins" refer to genomic information or the 
protein(s) or RNAs encoded therefrom, which, when disrupted
by knockout mutation or by other mutation, produce
inviability in cells harboring said mutation. 

"Non-essential genes" or "non-essential ORFs" or 
"non-essential proteins" refer to genomic information or the 
protein(s) or RNAs encoded therefrom, which, when disrupted
by knockout mutation or other mutation, do not result in
inviability of cells harboring said mutation. 

"Minimal gene set" refers to a genus of about 256 
genes that are conserved among different bacteria such as M. 
genitalium and H. influenzae. The minimal gene set appears 

to be necessary and sufficient to sustain life. See, e.g., A.
Mushegian and E. Koonin, "A minimal gene set for cellular
life derived by comparison of complete bacterial genomes,"
Proc. Natl. Acad. Sci. USA 93, 10268-10273 (1996).
The term "fragment thereof" denotes a fragment of
a nucleic acid molecule described herein, wherein said
fragment comprises a region of contiguity within said
nucleic acid of at least 15 base pairs. The term may also
refer to a peptide of at least 5 contiguous amino acid
residues of a protein disclosed herein.

The term "plasmid" refers to an extrachromosomal 
genetic element. The starting plasmids herein are either 
commercially available, publicly available on an 
unrestricted basis, or can be constructed from available
plasmids in accordance with published procedures. In 
addition, equivalent plasmids to those described are known 
in the art and will be apparent to the ordinarily skilled 
artisan. 

"Recombinant DNA cloning vector" as used herein
refers to any autonomously replicating agent, including, but
not limited to, plasmids and phages, comprising a DNA
molecule to which one or more additional DNA segments can be
or have been added.

The term "recombinant DNA expression vector" as
used herein refers to any recombinant DNA cloning vector,
for example a plasmid or phage, in which a promoter and
other regulatory elements are present to enable
transcription of the inserted DNA.

The term "vector" as used herein refers to a
nucleic acid compound used for introducing exogenous DNA
into host cells. A vector comprises a nucleotide sequence
which may encode one or more protein molecules. Plasmids,
cosmids, viruses, and bacteriophages, in the natural state
or which have undergone recombinant engineering, are
examples of commonly used vectors.

WO 98/26072 PCT/US97/22578

The terms "complementary" or "complementarity" as
used herein refer to the capacity of purine and pyrimidine
nucleotides to associate through hydrogen bonding in
double-stranded nucleic acid molecules. The following base
pairs are complementary: guanine and cytosine; adenine and
thymine; and adenine and uracil.
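Because paired strands run antiparallel, the strand complementary to a given sequence is conventionally written as its reverse complement. The sketch below applies the pairing rules stated above; the function names are hypothetical, and only the standard bases are handled (ambiguity codes such as N raise a `KeyError` in `reverse_complement`).

```python
# Pairing rules from the definition above (A-U applies when RNA is involved).
COMPLEMENT_DNA = {"A": "T", "T": "A", "G": "C", "C": "G"}
COMPLEMENT_RNA = {"A": "U", "U": "A", "G": "C", "C": "G"}
PAIRS = {("G", "C"), ("C", "G"), ("A", "T"), ("T", "A"), ("A", "U"), ("U", "A")}


def reverse_complement(seq: str, rna: bool = False) -> str:
    """Return the complementary strand, written 5'->3' (hence reversed)."""
    table = COMPLEMENT_RNA if rna else COMPLEMENT_DNA
    return "".join(table[base] for base in reversed(seq.upper()))


def is_perfect_duplex(strand_1: str, strand_2: str) -> bool:
    """True if two strands, each written 5'->3', pair at every position when antiparallel."""
    return len(strand_1) == len(strand_2) and all(
        (x, y) in PAIRS for x, y in zip(strand_1.upper(), reversed(strand_2.upper()))
    )
```

For example, the DNA strand complementary to `ATGC` is `GCAT`, and `is_perfect_duplex("ATGC", "GCAU")` also holds because adenine pairs with uracil in a DNA:RNA hybrid.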

"Oligonucleotide" refers to a short polymeric 
nucleotide chain comprising from about 2 to 25 nucleotides.

"Isolated nucleic acid compound" refers to any RNA 
or DNA sequence, however constructed or synthesized, which 
is locationally distinct from its natural location. 

A "primer" is a nucleic acid fragment which 
functions as an initiating substrate for enzymatic or
synthetic elongation of a nucleic acid molecule. 

The term "promoter" refers to a DNA sequence which 
directs transcription of DNA to RNA. 

A "probe" as used herein is a labeled nucleic acid 
compound which can be used to hybridize with another nucleic
acid compound. 

The term "hybridization" or "hybridize" as used 
herein refers to the process by which a single-stranded 
nucleic acid molecule joins with a complementary strand 
through nucleotide base pairing.

"Recorded" as used herein refers to a process for 
storing information on a computer readable medium. 

"Substantially identical" means a sequence having 
sufficient homology to hybridize under high stringency 
conditions and/or at least 90% identity at the nucleotide or
amino acid sequence level to a sequence disclosed herein. 
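The 90% identity criterion can be computed from a pairwise alignment. The sketch assumes the two sequences are already aligned (equal length, "-" for gaps) and divides identical columns by all alignment columns; alignment programs differ in the denominator they use, so this convention is an assumption, and the function names are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of alignment columns in which both sequences carry the same residue.

    Assumes the inputs are rows of one pairwise alignment ('-' marks a gap).
    Gap columns count in the denominator but never as matches.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("empty alignment")
    matches = sum(
        1 for x, y in zip(aligned_a.upper(), aligned_b.upper()) if x == y and x != "-"
    )
    return 100.0 * matches / len(aligned_a)


def meets_identity_threshold(aligned_a: str, aligned_b: str, threshold: float = 90.0) -> bool:
    """Apply an 'at least 90% identity' criterion to an aligned pair."""
    return percent_identity(aligned_a, aligned_b) >= threshold
```

Under this convention, ten aligned nucleotides differing at one position give exactly 90% identity and meet the threshold, while a single gap in a five-column alignment gives 80%.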

"Substantially purified" when used in reference to 
a protein or peptide means that the molecule has been 
largely, but not necessarily wholly, separated and purified 
from other cellular and non-cellular components. Typically a
protein is substantially pure when it is at least about 60%
by weight free of other naturally occurring organic
molecules. Preferably the purity is at least about 75%, more
preferably at least about 90%, and most preferably at least
about 99% by weight.

"Selective hybridization" refers to hybridization 
under conditions of high stringency. Hybridization of 
nucleic acid molecules depends upon factors such as the
degree of complementarity, stringency of hybridization 
conditions, and the length of hybridizing strands. 

The term "stringency" relates to nucleic acid 
hybridization conditions. High stringency conditions 
disfavor non-homologous base pairing. Low stringency
conditions have the opposite effect. Stringency may be
altered, for example, by changes in temperature and salt
concentration. Typical high stringency conditions comprise
hybridizing at 50°C to 65°C in 5X SSPE and 50% formamide,
and washing at 50°C to 65°C in 0.5X SSPE; typical low
stringency conditions comprise hybridizing at 35°C to 37°C
in 5X SSPE and 40% to 45% formamide and washing at 42°C in
1X-2X SSPE.

"SSPE" denotes a hybridization and wash solution 
comprising sodium chloride, sodium phosphate, and EDTA, at
pH 7.4. A 20X solution of SSPE is made by dissolving 174 g
of NaCl, 27.6 g of NaH2PO4·H2O, and 7.4 g of EDTA in 800 ml
of H2O. The pH is adjusted with NaOH and the volume brought
to 1 liter.

"SSC" denotes a hybridization and wash solution
comprising sodium chloride and sodium citrate at pH 7. A 20X
solution of SSC is made by dissolving 175 g of NaCl and 88 g
of sodium citrate in 800 ml of H2O. The volume is brought to
1 liter after adjusting the pH with 10N NaOH.
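As a check on the SSPE and SSC recipes, molarity is mass divided by formula weight and final volume. The sketch below assumes the EDTA is the disodium dihydrate salt and the sodium citrate is trisodium citrate dihydrate (the text does not specify the salt forms), with standard formula weights; under those assumptions 1X SSPE works out to about 0.15 M NaCl, 10 mM sodium phosphate, and 1 mM EDTA, and 1X SSC to about 0.15 M NaCl and 15 mM sodium citrate. The dictionary keys and function names are hypothetical.

```python
# Formula weights in g/mol. The EDTA and citrate salt forms are assumptions.
FORMULA_WEIGHT = {
    "NaCl": 58.44,
    "NaH2PO4.H2O": 137.99,
    "Na2EDTA.2H2O": 372.24,
    "Na3citrate.2H2O": 294.10,
}

# Grams per litre of 20X stock, taken from the recipes above.
SSPE_20X = {"NaCl": 174.0, "NaH2PO4.H2O": 27.6, "Na2EDTA.2H2O": 7.4}
SSC_20X = {"NaCl": 175.0, "Na3citrate.2H2O": 88.0}


def molarity(grams: float, reagent: str, litres: float = 1.0) -> float:
    """Moles per litre for `grams` of `reagent` brought to `litres` final volume."""
    return grams / FORMULA_WEIGHT[reagent] / litres


def working_strength(recipe_20x: dict[str, float], fold: float) -> dict[str, float]:
    """Molar concentration of each component after diluting a 20X stock to `fold`X."""
    return {name: molarity(g, name) * fold / 20 for name, g in recipe_20x.items()}


if __name__ == "__main__":
    # Concentrations (mM) for the strengths named in the stringency conditions.
    for label, recipe in (("SSPE", SSPE_20X), ("SSC", SSC_20X)):
        for fold in (5, 1, 0.5):
            conc = working_strength(recipe, fold)
            print(label, f"{fold}X", {k: round(v * 1000, 1) for k, v in conc.items()}, "mM")
```

The same arithmetic shows that the 5X SSPE used for hybridization is roughly 0.74 M NaCl.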

"Virulence gene" as used herein means a gene from
a pathogenic organism such as S. pneumoniae that is required
for infection and/or pathogenicity in vivo. Some virulence
genes are induced during infection of a host; others are
expressed exclusively during in vivo infection.



The Streptococcus pneumoniae genome contains about 2.2
million nucleotide base pairs and comprises about 2000 to
3000 ORFs and other genes. This invention provides, among
other things, contiguous fragments, genes, and proteins from
the S. pneumoniae genome (SEQ ID NO:1 through SEQ ID
NO:228).

Strain differences in S. pneumoniae may be associated
with nucleotide sequence differences in one or more of the
genomic fragments disclosed herein. Sequences that are
substantially identical to the sequences disclosed herein
are intended to be within the scope of the invention.

The sequence fragments disclosed herein provide a wide
variety of utilities. For example, the fragments may be used
to identify regions of the S. pneumoniae genome that are
expressed (viz., transcribed into mRNA and, for ORFs,
translated into protein). The genomic fragments disclosed
herein can also be used to examine differential expression
of S. pneumoniae genes under diverse environmental
conditions, as occurs, for example, with the expression of
virulence genes during in vivo infection of a host organism.
Also contemplated by the invention are: (1) preparation of
molecular hybridization probes for use in physical mapping,
sequencing, mutagenesis, and mutation analysis, (2) homology
comparisons of the sequences disclosed herein with the
genomes and ORFs of other organisms, (3) creation of
specifically mutated strains of S. pneumoniae wherein the
mutation is targeted to any site in the DNA sequence
disclosed herein, (4) identification of S. pneumoniae
promoters and other gene regulatory sequences, (5)
identification of proteins and RNAs encoded by S.
pneumoniae, (6) amplification of S. pneumoniae genes using
the PCR, and (7) production of kits for isolating and
analyzing genes that are mutated in antibiotic-resistant
clinical isolates of S. pneumoniae.

Genome Analysis 

In one embodiment, the invention comprises the ORFs and
fragments thereof encoded by the nucleotide sequences
disclosed herein. Some of the nucleotide sequences disclosed
herein encode ORFs and fragments of ORFs (Table 1). The ORFs
or fragments thereof were identified by translation of the
nucleic acid sequences disclosed herein. The biological
function of a protein disclosed in Table 1 was predicted by
homology comparison with known proteins from other
organisms. A number of computer programs are available to
assist in these analyses, for example the gene-finding
program GeneMark (Borodovsky and McIninch, Computers Chem.
17(2), 123, 1993).

Computer-Related Applications 

The nucleotide and/or amino acid sequence information
of this invention may be provided in a variety of media to
facilitate use. In one embodiment the present invention
comprises one or more of the sequences disclosed herein
recorded on a computer readable medium. A variety of media
are contemplated, for example, magnetic storage media such
as floppy discs, hard disc storage, and magnetic tape, and
optical media such as CD-ROM. A skilled artisan can readily
adopt any presently known method for recording information
on a computer readable medium to generate articles of
manufacture comprising the nucleotide or amino acid sequence
information of the present invention. These embodiments are
contemplated within the scope of this invention.

The choice of a data storage structure will generally
be based on the means chosen to access the stored
information. A variety of data processor programs and
formats can be used to store the sequence information of the
invention on computer readable medium. For example, the
sequence can be represented in a word processing text file
that is formatted in commercially available software such as
WordPerfect and Microsoft Word, or it can be represented in
the form of a text-only file such as ASCII.
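A text-only (ASCII) sequence file of the kind described above is commonly laid out in FASTA format, in which each record is a ">" header line followed by the sequence wrapped at a fixed width. FASTA is not named in the text; it is offered here as one assumed convention, and the function and record names are hypothetical.

```python
def format_fasta(records: dict[str, str], width: int = 60) -> str:
    """Render name -> sequence pairs as plain-text FASTA with a fixed line width."""
    lines = []
    for name, seq in records.items():
        lines.append(f">{name}")
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


def parse_fasta(text: str) -> dict[str, str]:
    """Inverse of format_fasta: join the wrapped sequence lines under each header."""
    records: dict[str, list[str]] = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].strip()
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {key: "".join(parts) for key, parts in records.items()}
```

The resulting string can be written to disk with, for example, `pathlib.Path("contigs.fasta").write_text(format_fasta(records), encoding="ascii")`, giving a file readable by most sequence-analysis software.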

Having S. pneumoniae genomic sequence information in a
computer readable format enables a skilled artisan to access
the information for a variety of purposes. For example,
computer-assisted searching algorithms may be used to
identify open reading frames and to infer biological
function based on homology to known proteins from other
organisms. Suitable algorithms for sequence comparisons
include BLAST (Altschul et al., J. Mol. Biol. 215, 403-410,
1990) and BLAZE (Brutlag et al., Comp. Chem. 17, 203-207,
1993). For identification of ORFs a number of commercially
available software programs are suitable, such as FRAMES
(Genetics Computer Group, Madison, WI).
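BLAZE, cited above, implements the Smith-Waterman local alignment algorithm on parallel hardware, and BLAST approximates such local alignments heuristically for speed. The exact recurrence is sketched below for short nucleotide strings. The match, mismatch, and linear gap scores are illustrative assumptions; protein searches typically use a substitution matrix such as BLOSUM62 and affine gap penalties, and `smith_waterman_score` is a hypothetical name.

```python
def smith_waterman_score(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score between `a` and `b` with a linear gap penalty.

    Uses the Smith-Waterman recurrence (scores floored at 0 so an alignment can
    start anywhere), keeping only two rows of the dynamic-programming matrix.
    """
    a, b = a.upper(), b.upper()
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diagonal = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur[j] = max(0, diagonal, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best
```

With these scores, `TTACGTTT` and `GGACGTGG` share the local match `ACGT` and score 8, whereas unrelated strings such as `AAAA` and `TTTT` score 0.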

The genomic information of this invention in
computer-readable form can be manipulated further using
bioinformatics to identify the biological function of
proteins encoded by ORFs as well as the cellular location of
said proteins. The skilled artisan will recognize several
computer-assisted algorithms for this purpose, for example,
PSORT, which is useful for determining the likely location
of a protein within a cell (see K. Nakai and M. Kanehisa,
"Expert system for predicting protein localization sites in
Gram-negative bacteria," Proteins: Structure, Function, and
Genetics 11, 95-110 (1991)).

Open Reading Frames and Proteins 

The invention also provides proteins encoded by the S.
pneumoniae genome in substantially purified form (see Table
1). The proteins are classified herein as (1) Hypothetical,
(2) Cell wall biosynthetic, (3) External target, or (4)
Minimal gene set proteins.

Cells that carry knockout mutations in genes encoding
proteins of the hypothetical class are nonviable. Loss of
viability suggests that these proteins may be essential for
viability. Two such proteins, whose genes map to contigs m014
and m016, correspond respectively to Haemophilus influenzae
ORFs HI1146 and HI1648. Two other hypothetical proteins, yyaF
and ywbL, correspond to a GTP binding protein and a
transcriptional regulator, respectively.

The proteins of this invention can be used to raise
antibodies. Antibodies against the hypothetical class of
proteins are especially attractive: because these proteins
appear to serve essential cellular functions, antibodies
against "hypothetical proteins" could have therapeutic or
prophylactic applications. Additionally, the "hypothetical"
proteins can be used to screen for agents that bind or
otherwise interact with said proteins. Such agents could
lead to the identification of new antibacterial agents.

Proteins classified in Table 1 as cell wall
biosynthetic proteins and external target proteins were
identified by homology with known proteins. These proteins
are useful for identifying agents that bind to them and
inhibit bacterial growth. Therefore, in another embodiment
of the invention, the proteins of these classifications are
prepared, preferably by recombinant means as described
herein, substantially purified, and used in a screen to
identify compounds that bind to and/or inhibit the activity
of said proteins. A variety of suitable screens are
contemplated for this purpose. For example, the protein(s)
can be labeled by known techniques such as radiolabeling or
fluorescent tagging, or by labeling with biotin/avidin;
thereafter, binding of a test compound to a labeled protein
can be determined by any suitable means well known to the
skilled artisan.

The proteins categorized as "minimal gene set" are
homologous to a set of highly conserved proteins found in
other bacteria. The minimal gene set proteins are thought to
be essential for viability, and are useful targets for the
development of new antibacterial compounds.

DNA Chips and Applications 

The nucleic acids disclosed herein, or subfragments
thereof, may be arrayed on any suitable solid surface,
thereby constructing a "chip." DNA chip hybridizations
can provide greater sensitivity than conventional
hybridization methods such as Southern or Northern
hybridization. DNA chips are useful for a variety
of purposes, for example, in mutation and gene expression
analysis, and in probing the structure, function, and
expression of the genome. This aspect of the invention
relates to any one or more of the DNA fragments disclosed
herein, wherein said fragments are attached to a solid
support (i.e., "chip" or "DNA chip" or "Bio chip").
Attachment of a nucleic acid to a support can be, but is
not necessarily, accomplished by chemical or enzymatic means.

In one embodiment, DNA fragments of this invention are
arrayed onto a solid support as a means for assessing gene
expression in S. pneumoniae. The DNA fragments attached to a
chip may be of any size that is suitable for hybridization
to other nucleic acid molecules such as cDNAs, genomic DNAs,
or RNAs. Suitably sized DNA fragments range from 10
nucleotide residues to approximately several thousand
residues. The preferred length is about 50 to 500
nucleotides.
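To illustrate how fragments in the preferred 50 to 500 nucleotide range might be taken from a contig, the sketch below tiles a sequence into overlapping fixed-length probes. The fixed-length tiling, the default 60-nt probe and 30-nt step, and the name `tile_probes` are assumptions; practical probe design would also weigh base composition, cross-hybridization, and secondary structure.

```python
def tile_probes(seq: str, probe_len: int = 60, step: int = 30) -> list[tuple[int, str]]:
    """Overlapping fixed-length probes as (0-based start, probe sequence) pairs.

    A final probe flush with the 3' end is added when the step would
    otherwise leave the last bases uncovered. Sequences no longer than
    `probe_len` are returned as a single probe.
    """
    if not 50 <= probe_len <= 500:
        raise ValueError("probe_len outside the preferred 50-500 nt range")
    if step <= 0:
        raise ValueError("step must be positive")
    if len(seq) <= probe_len:
        return [(0, seq)]
    starts = list(range(0, len(seq) - probe_len + 1, step))
    if starts[-1] != len(seq) - probe_len:
        starts.append(len(seq) - probe_len)  # cover the 3' end
    return [(s, seq[s:s + probe_len]) for s in starts]
```

For a 130-nt sequence the default settings give probes starting at positions 0, 30, 60, and 70; choosing `step` no larger than `probe_len` ensures every base is covered by at least one probe.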

Gene expression is analyzed with the chips of this
invention by hybridizing a chip to RNA samples, or to cDNA
samples, prepared from S. pneumoniae grown under any
suitable conditions. Preferred samples for hybridization to
a chip comprise cDNA. Methods for preparing RNA or cDNA are
well known in the art.

A variety of suitable methods are known for fixing DNA
fragments to solid support matrices [see, e.g., D. Stimpson et
al., "Real-time detection of DNA hybridization and melting on
oligonucleotide arrays by using optical wave guides," Proc.
Natl. Acad. Sci. 92, 6379 (1995)]. Preferred surfaces for
producing a chip are glass or polystyrene. Convenient
surfaces are microscope slides or cover slips (Corning),
treated with silicon or silane to minimize non-specific
binding by DNA or proteins. Also suitable for this purpose
are 96-well microtiter plates.

A light-directed method may be used for attaching
oligonucleotides, enabling nucleotide synthesis directly on
the solid surface using photolabile 5'-protected
N-acyl-deoxynucleotide phosphoramidites and surface linker
chemistry (see Pease et al., "Light-generated oligonucleotide
arrays for rapid DNA sequence analysis," Proc. Natl. Acad.
Sci. 91, 5022-5026, 1994). Alternatively, DNA fragments can
be bound to a surface via interaction with a specific DNA
binding protein. Any suitable DNA binding protein may be
used, for example bacteriophage DNA binding proteins,
adenovirus DNA binding protein, the E. coli lac repressor
protein, or the λ repressor protein. DNA binding proteins are
attached to the surface of a chip by covalent chemical
binding, essentially as described in U.S. Patent 5,561,071,
the entire contents of which are incorporated by reference.
The latter method requires that DNA fragments contain a
recognition sequence that enables binding by the DNA binding
protein. Specific sequences for a number of DNA binding
proteins are known. Methods for incorporating specific
binding sequences into the genomic DNA fragments disclosed
herein are well known in the cloning arts.
DNA chip technology enables monitoring of S. pneumoniae
gene expression on a genome-wide level. This feature of the
invention is particularly attractive for identifying (1)
genes that are expressed or not expressed during the life
cycle or infection cycle of S. pneumoniae, and (2) changes
in gene expression that correlate with environmental change.

For example, virulence genes in S. pneumoniae can be
identified by the DNA chip method disclosed herein.
Identification of virulence genes in S. pneumoniae will
provide new targets for developing novel antibiotics. For
this aspect of the invention, any suitable encapsulated
strain of S. pneumoniae is introduced into a mouse, for
example by intraperitoneal injection, by introduction
directly into the lungs, or by any other suitable method.
Approximately 2 days after infection, a peripheral blood
titre of about 10^8 S. pneumoniae cells/ml is reached. Cells
recovered from peripheral blood, or other suitable tissue,
are used in identifying virulence genes. For this purpose,
cDNAs are prepared from cells recovered from an in vivo
infection and from cells grown in vitro. After labeling, the
cDNAs are hybridized against the DNA chip(s) disclosed
herein. Genomic fragments that hybridize to the in vivo
probe but not to the in vitro probe identify candidate
virulence genes.
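The selection rule in the preceding paragraph (hybridizes to the in vivo probe but not to the in vitro probe) can be expressed as a simple filter over per-fragment signals. The sketch assumes background-corrected intensities keyed by fragment name; the thresholds, fragment names, and function name are hypothetical placeholders that would have to be set from control spots.

```python
def candidate_virulence_fragments(
    in_vivo: dict[str, float],
    in_vitro: dict[str, float],
    present: float = 500.0,
    absent: float = 100.0,
) -> list[str]:
    """Fragments detected with the in vivo probe but not with the in vitro probe.

    `present` and `absent` are hypothetical signal thresholds: a fragment is
    reported when its in vivo signal is at least `present` and its in vitro
    signal (0 if the fragment was not measured) is at most `absent`.
    """
    return sorted(
        name
        for name, signal in in_vivo.items()
        if signal >= present and in_vitro.get(name, 0.0) <= absent
    )
```

A fragment that is strongly labeled by both probes is expressed under both conditions and is therefore not reported; only fragments whose signal is specific to the in vivo sample are returned.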

Also contemplated by this aspect of the invention is a
method for analyzing gene expression in S. pneumoniae cells
grown or harvested from any desirable in vitro or in vivo
environment, wherein said environment may include compounds
whose effects on gene expression are to be determined.

In another embodiment, the present invention relates to
a DNA bio-chip, useful for correlating DNA sequence with
biological function. The bio-chip comprises an array of the
genomic DNA fragments disclosed herein, or portions thereof,
attached to the surface of any suitable solid support
material. The bio-chip further comprises a layer of
competent S. pneumoniae cells suspended over the DNA array
in any suitable semi-solid medium such as agar or agarose.
The cells suspended on the bio-chip comprise known or
unknown mutant strains, or they may be wild-type cells. The
cell layer is in contact with the DNA matrix such that DNA
on the chip can be taken up by the cells.

The bio-chip is useful for several purposes. For
example, the bio-chip can be used to localize an unknown
mutation to a specific region of the genome by
complementation. The bio-chip enables correlating a
phenotype with a genetic locus. For example, mutant cells
harboring one or more mutations and having at least one
screenable or selectable phenotype can be applied to a
bio-chip and subjected to an environment that allows for
selection, or for screening by complementation. If said
phenotype is the result of a chromosomal mutation or
mutations that map to a genomic fragment present on the
chip, DNA uptake by the cells and repair of the mutation by
recombination will be identifiable by a suitable screen or
selection.
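If each spotted fragment has known genome coordinates, the complementation result described above narrows the location of the mutation: a mutation repaired by several overlapping fragments must lie where they overlap. The sketch below assumes half-open (start, end) coordinates, a single causative mutation, and hypothetical fragment names.

```python
from typing import Optional


def localize_mutation(
    fragments: dict[str, tuple[int, int]], complementing: set[str]
) -> Optional[tuple[int, int]]:
    """Interval shared by every fragment that complemented the mutant phenotype.

    `fragments` maps a spot name to its half-open (start, end) genome
    coordinates. Assuming a single mutation, every complementing fragment
    must contain it, so it lies in the intersection of their intervals.
    Returns None when nothing complemented or the intervals do not overlap.
    """
    if not complementing:
        return None
    start = max(fragments[name][0] for name in complementing)
    end = min(fragments[name][1] for name in complementing)
    return (start, end) if start < end else None
```

For example, if fragments spanning positions 0 to 1000 and 500 to 1500 both restore the phenotype, the mutation is placed between positions 500 and 1000. An empty intersection suggests that the single-mutation assumption does not hold.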

In a preferred embodiment, the bio-chip is overlaid
with competent S. pneumoniae cells. Methods for preparing
competent cells are known (see, e.g., LeBlanc et al., Plasmid
28, 130-145, 1992; Pozzi et al., J. Bacteriol. 178, 6087-6090,
1996).

Other embodiments of this aspect of the invention are
contemplated. For example, the genomic fragments disclosed
herein could be prepared and dispensed into individual wells
of a 96-well microtiter plate. Competent S. pneumoniae
cells could then be added to the wells under conditions
suitable for DNA uptake, followed by plating onto any
suitable selection or screening medium, for example an agar
plate containing suitable growth and/or selection/screening
components.

Diagnostic Kits and Assays 
The present invention further relates to kits and
assays that can be used for rapid and efficient detection of
S. pneumoniae cells. Also contemplated are kits for
detecting mutations carried by S. pneumoniae cells. Kits of
this nature are particularly attractive in the clinical
environment, where knowledge about the identity of a pathogen
and/or of the basis for resistance to antibiotic treatments
is essential for effective medical treatment. In the long
term, knowledge of the mutations that lead to resistance
will enable the design of new antibacterial agents.

A kit for detecting S. pneumoniae cells can be based on
antibody recognition of S. pneumoniae-specific antigens or
epitopes, or on nucleic acid hybridization techniques for
the detection of S. pneumoniae-specific nucleic acid
molecules.

A variety of embodiments are contemplated in this
aspect of the invention. In one embodiment a kit is provided
for detecting mutations in drug-resistant S. pneumoniae. For
this purpose, DNA is prepared from a resistant isolate and
from a wild-type strain. In a preferred embodiment, the
polymerase chain reaction (i.e., PCR) is used to amplify DNA
samples representing any one or all of the genomic fragments
disclosed herein. The amplified DNA samples from the mutant
and wild-type strains are labeled by any suitable means, for
example using radioisotopes or fluorescent labeling, and are
hybridized to a DNA chip having fixed thereon any one or
more of the genomic fragments disclosed herein.
Hybridization of the amplified DNAs to the chip under
conditions that can discriminate single or multiple base
pair mismatches enables the detection of differences
between the mutant and wild-type samples. This method
identifies a specific fragment of the genome that is altered
in the mutant strain. The specific mutation can be
determined by conventional DNA sequence analysis.
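Once the altered fragment has been sequenced from both strains, the specific mutation can be read off by comparing the two sequences position by position. The sketch assumes the wild-type and mutant sequences are already aligned and equal in length (substitutions only; insertions and deletions are not handled), and `point_differences` is a hypothetical name.

```python
def point_differences(wild_type: str, mutant: str) -> list[tuple[int, str, str]]:
    """List (1-based position, wild-type base, mutant base) for each substitution.

    Assumes both sequences are already aligned and equal in length.
    """
    if len(wild_type) != len(mutant):
        raise ValueError("sequences must be aligned to equal length")
    return [
        (pos, w, m)
        for pos, (w, m) in enumerate(zip(wild_type.upper(), mutant.upper()), start=1)
        if w != m
    ]
```

For example, comparing `ACGTACGT` with `ACGAACGT` reports a single T-to-A substitution at position 4. Positions are relative to the start of the sequenced fragment, not to the genome.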

This aspect of the invention also relates to the
detection of S. pneumoniae proteins in a sample using
antibody molecules raised against the protein encoded by any
suitable ORF disclosed herein. Antibody detection methods
are well known to those skilled in the art, including, for
example, a variety of radioimmunological assays (see, e.g.,
P. Tijssen, Practice and Theory of Enzyme Immunoassays:
Laboratory Techniques in Biochemistry and Molecular Biology,
Elsevier Science Publishers, Amsterdam, The Netherlands,
1985).

Test samples suitable for use in this aspect of the
invention include, but are not limited to, biological fluids
such as sputum, blood, serum, plasma, and urine, and biopsy
samples.

Skilled artisans will recognize that the disclosed
method and reagents can be readily incorporated into a kit.
For example, a kit would contain one or more receptacles
comprising one or more of the following: PCR reagents, DNA
chip reagents, labeling reagents, assorted buffers, and/or
antibodies.


Production of Antibodies 

The proteins of this invention and fragments
thereof may be used in the production of antibodies. The
term "antibody" as used herein describes antibodies,
fragments of antibodies (such as, but not limited to, Fab,
Fab', F(ab')2, and Fv fragments), and chimeric, humanized,
veneered, resurfaced, or CDR-grafted antibodies capable of
binding antigens of a similar nature as the parent antibody
molecule from which they are derived. The instant invention
also encompasses single chain polypeptide binding molecules.

The production of antibodies, both monoclonal and
polyclonal, in animals is well known in the art. See, e.g.,
C. Milstein, Handbook of Experimental Immunology (Blackwell
Scientific Pub., 1986); J. Goding, Monoclonal Antibodies:
Principles and Practice (Academic Press, 1983). For the
production of monoclonal antibodies the process begins with
injecting a mouse, or other suitable animal, with an
immunogen. The mouse is subsequently sacrificed and cells
taken from its spleen are fused with myeloma cells,
resulting in hybridomas that can be cultured in vitro.
Hybridomas are screened for clones that secrete a single
antibody species specific for the immunogen.

Chimeric antibodies are described in U.S. Patent No.
4,816,567, herein incorporated by reference, which teaches
methods and vectors for preparing chimeric antibodies. An
alternative approach is provided in U.S. Patent No.
4,816,397, the entire contents of which are herein
incorporated by reference. This patent teaches co-expression
of heavy and light chains in the same host cell.

The method taught in U.S. Patent 4,816,397 has
been further refined in European Patent Publication No.
0 239 400. The teachings of this publication are preferred for
engineering monoclonal antibodies. In this technology the
complementarity determining regions (CDRs) of a human
antibody are replaced with the CDRs of a murine monoclonal
antibody, thereby converting the specificity of the human
antibody to the specificity of the murine antibody.

Single chain antibodies and libraries thereof
provide yet another means for genetically engineering
antibody molecules (see, e.g., R.E. Bird, et al., Science
242:423-426 (1988); PCT Publication Nos. WO 88/01649, WO
90/14430, and WO 91/10737). Single chain antibody technology
involves covalently joining the binding regions of heavy and
light chains, thereby generating a single polypeptide chain
having the binding specificity of an intact antibody
molecule.

The antibodies contemplated by the present invention
are useful in diagnostics, therapeutics, or in
diagnostic/therapeutic combinations.

The proteins of this invention, or suitable fragments
thereof, can be used to generate polyclonal or monoclonal
antibodies, and various inter-species hybrids, or humanized
antibodies, or antibody fragments, or single-chain
antibodies. The techniques for producing antibodies are well
known to skilled artisans (see, e.g., A.M. Campbell,
Monoclonal Antibody Technology: Laboratory Techniques in
Biochemistry and Molecular Biology, Elsevier Science
Publishers, Amsterdam (1984); Kohler and Milstein, Nature
256, 495-497 (1975); Monoclonal Antibodies: Principles &
Applications, Ed. J.R. Birch & E.S. Lennox, Wiley-Liss, 1995).

A protein or peptide to be used as an immunogen may be
administered in an adjuvant by subcutaneous or
intraperitoneal injection into, for example, a mouse or a
rabbit. For the production of monoclonal antibodies, spleen
cells from immunized animals are removed, fused with myeloma
cells, such as SP2/0-Ag14 cells, and allowed to become
monoclonal antibody producing hybridoma cells in the manner
known to the skilled artisan. Hybridomas that secrete the
desired antibody molecule can be screened by a variety of
well known methods, for example ELISA, western blot
analysis, or radioimmunoassay (Lutz et al., Exp. Cell Res.
175, 109-124 (1988); Monoclonal Antibodies: Principles &
Applications, Ed. J.R. Birch & E.S. Lennox, Wiley-Liss, 1995).

For some applications it is desirable to have an
antibody labeled in some fashion. Procedures for labeling
antibody molecules with radioisotopes, affinity labels such
as biotin or avidin, enzymatic labels such as horseradish
peroxidase, and fluorescent labels such as FITC or rhodamine
are widely known (see, e.g., Enzyme-Mediated Immunoassay,
Ed. T. Ngo, H. Lenhoff, Plenum Press, 1985; Principles of
Immunology and Immunodiagnostics, R.M. Aloisi, Lea & Febiger,
1988).

Labeled antibodies are useful for a variety of
diagnostic applications. In one embodiment, the present
invention relates to the use of labeled antibodies to detect
the presence of S. pneumoniae cells and proteins. Also
contemplated are applications that use antibodies,
preferably single chain antibodies, directed against an
S. pneumoniae protein. Proteins identified as "external
targets" are preferred for the generation of single chain
antibodies. Single chain antibody libraries directed against
S. pneumoniae surface proteins and cell wall proteins can be
produced by applying the phage display technique to crude
membrane preparations. Antibodies that recognize and bind to
external target proteins and/or cell wall proteins could be
used as therapeutic agents to inhibit the growth of
S. pneumoniae. Alternatively, the antibodies could be used in a
screen to identify potential inhibitors of an external
target protein. For example, in a competitive displacement
assay, an antibody or compound to be tested is labeled by
any suitable method. Competitive displacement of an antibody
from an antibody-antigen complex by a test compound provides
a means to identify new antibacterial compounds.
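The readout of a competitive displacement assay reduces to simple arithmetic on the label signal. The following is a minimal sketch, assuming that bound labeled antibody is measured with no test compound (maximum binding), under fully competed conditions (nonspecific background), and in the presence of each test compound. The function names and the 50% hit threshold are hypothetical choices made for illustration; they are not values taken from this disclosure.

```python
def percent_displacement(signal: float, max_signal: float, background: float) -> float:
    """Percent of specifically bound labeled antibody displaced by a test compound.

    max_signal: bound label with no competitor (specific plus nonspecific binding).
    background: bound label under fully competed conditions (nonspecific only).
    """
    specific_max = max_signal - background
    if specific_max <= 0:
        raise ValueError("max_signal must exceed background")
    remaining = (signal - background) / specific_max  # fraction still specifically bound
    return 100.0 * (1.0 - remaining)


def call_hits(signals: dict[str, float], max_signal: float, background: float,
              threshold: float = 50.0) -> list[str]:
    """Return compound IDs whose displacement meets a (hypothetical) threshold."""
    return sorted(
        cid for cid, s in signals.items()
        if percent_displacement(s, max_signal, background) >= threshold
    )


if __name__ == "__main__":
    readings = {"cmpd-A": 9000.0, "cmpd-B": 3000.0, "cmpd-C": 1200.0}
    print(call_hits(readings, max_signal=10000.0, background=1000.0))  # ['cmpd-B', 'cmpd-C']
```

Expressing the result relative to specific binding, rather than raw signal, keeps hit calls comparable between assay plates that have different background levels.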

Protein Production Methods 
The present invention further relates to
substantially purified proteins encoded by the ORFs
disclosed herein (SEQ ID NO: 87 through SEQ ID NO: 228).

Skilled artisans will recognize that proteins can
be synthesized by different methods, for example, chemical
methods or recombinant methods, as described in U.S. Patent
4,617,149, hereby incorporated by reference.

The principles of solid phase chemical synthesis
of polypeptides are well known in the art and may be found
in general texts relating to this area. See, e.g., H. Dugas
and C. Penney, Bioorganic Chemistry (1981) Springer-Verlag,
New York, 54-92. Peptides may be synthesized by solid-phase
methodology utilizing an Applied Biosystems 430A peptide
synthesizer (Applied Biosystems, Foster City, CA) and
synthesis cycles supplied by Applied Biosystems. Protected
amino acids, such as t-butoxycarbonyl-protected amino acids,
and other reagents are commercially available from many
chemical supply houses.

The proteins and peptides of the present invention
can also be made by recombinant DNA methods. Recombinant
methods are preferred if a high yield is desired.
Recombinant methods involve expressing a cloned ORF/gene in
a suitable host cell. A gene is introduced into a host cell
by any suitable means well known to those skilled in the
art. While chromosomal integration of a cloned gene is
within the scope of the present invention, it is preferred
that a cloned gene be maintained extrachromosomally, as
part of a vector wherein the gene is in operable linkage to
a constitutive or inducible promoter.

Recombinant methods are also useful in
overproducing a membrane-bound or membrane-associated
protein. In some cases, membranes prepared from recombinant
cells that overexpress such proteins provide an enriched
source of the protein. Such membranes are useful for
evaluating the function of the protein and/or for evaluating
inhibitors of the protein.
Expressing Recombinant Proteins in Procaryotic and Eucaryotic Host Cells

Procaryotes are generally used for cloning DNA
sequences and for constructing vectors. For example, the
Escherichia coli K12 strain 294 (ATCC No. 31446) is
particularly useful for expression of foreign proteins.
Other strains of E. coli; bacilli such as Bacillus subtilis;
enterobacteriaceae such as Salmonella typhimurium or
Serratia marcescens; and various Pseudomonas species may also
be employed as host cells in cloning and expressing the
recombinant proteins of this invention. Also contemplated
are various strains of Streptococcus and Streptomyces.

For effective expression of a recombinant protein,
a gene or ORF may be linked to a known promoter sequence.
Suitable bacterial promoters include β-lactamase [e.g.,
vector pGX2907, ATCC 39344, contains a replicon and the
β-lactamase gene], lactose systems [Chang et al., Nature
(London), 275:615 (1978); Goeddel et al., Nature (London),
281:544 (1979)], alkaline phosphatase, and the tryptophan
(trp) promoter system [vector pATH1 (ATCC 37695)] designed
for the expression of a trpE fusion protein. Hybrid
promoters such as the tac promoter (isolatable from plasmid
pDR540, ATCC-37282) are also suitable. Promoters for use in
bacterial systems also will contain a Shine-Dalgarno
sequence operably linked to the DNA encoding the desired
polypeptides. These examples are illustrative rather than
limiting.

A variety of mammalian cell systems and yeasts are
also suitable host cells. The yeast Saccharomyces
cerevisiae is a commonly used eucaryotic microorganism.
Other yeasts such as Kluyveromyces lactis are also suitable.
For expression of recombinant genes in Saccharomyces, the
plasmid YRp7 (ATCC-40053), for example, may be used. See,
e.g., L. Stinchcomb, et al., Nature, 282:39 (1979); J.
Kingsman et al., Gene, 7:141 (1979); S. Tschemper et al.,
Gene, 10:157 (1980). Plasmid YRp7 contains the TRP1 gene,
which provides a selectable marker in a trp1 mutant.



Purification of Recombinantly-Produced Protein

An expression vector carrying an ORF of the
present invention is transformed or transfected into a
suitable host cell using standard methods. Cells that
contain the vector are propagated under conditions suitable
for expression of the encoded protein. If the gene is under
the control of an inducible promoter, suitable growth
conditions include the appropriate inducer. The
recombinantly-produced protein may be purified from cellular
extracts of transformed cells by any suitable means.

In a preferred process for protein purification, a
gene/ORF is modified at the 5' end, or some other position,
to incorporate a plurality of histidine residues at the
amino terminus of the encoded protein. The "histidine tag"
produced thereby enables a single-step protein purification
method referred to as "immobilized metal ion affinity
chromatography" (IMAC), essentially as described in U.S.
Patent 4,569,794, hereby incorporated by reference. The IMAC
method enables rapid isolation of substantially pure protein
starting from a crude cellular extract.
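At the DNA level, an amino-terminal histidine tag can be introduced by inserting histidine codons immediately after the initiating ATG. The following is a minimal sketch, assuming that the ORF begins with ATG and stays in frame. It uses the two standard histidine codons, CAT and CAC. The function name and the alternating-codon choice are illustrative assumptions and do not describe the construction actually used in this disclosure.

```python
HIS_CODONS = ("CAT", "CAC")  # the two histidine codons in the standard genetic code


def add_n_terminal_his_tag(orf: str, n_his: int = 6) -> str:
    """Insert n_his histidine codons immediately after the ATG start codon."""
    orf = orf.upper()
    if not orf.startswith("ATG"):
        raise ValueError("ORF must begin with an ATG start codon")
    if len(orf) % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3 to stay in frame")
    # Alternate CAT/CAC to avoid a long perfect repeat (an illustrative choice).
    tag = "".join(HIS_CODONS[i % 2] for i in range(n_his))
    return orf[:3] + tag + orf[3:]


if __name__ == "__main__":
    print(add_n_terminal_his_tag("ATGAAATAA", n_his=2))  # ATGCATCACAAATAA
```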

As skilled artisans will recognize, the proteins
of the invention can be encoded by a multitude of different
nucleic acid sequences owing to the degeneracy of the
genetic code. The present invention further comprises these
alternate nucleic acid sequences.
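The size of this family of alternate sequences can be computed directly. The number of distinct coding sequences for a protein is the product, over its residues, of the number of codons for each amino acid. The following is a minimal sketch that assumes the standard genetic code. The function name is hypothetical.

```python
from math import prod

# Number of codons per amino acid in the standard genetic code (one-letter codes).
CODONS_PER_AA = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4, "H": 2, "I": 3,
    "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6, "T": 4, "W": 1, "Y": 2, "V": 4,
}
STOP_CODONS = 3  # TAA, TAG, TGA


def count_encoding_sequences(protein: str, include_stop: bool = True) -> int:
    """Count distinct DNA coding sequences that translate to `protein`."""
    n = prod(CODONS_PER_AA[aa] for aa in protein.upper())
    return n * STOP_CODONS if include_stop else n


if __name__ == "__main__":
    assert sum(CODONS_PER_AA.values()) + STOP_CODONS == 64  # sanity check on the table
    print(count_encoding_sequences("MHHHHHH", include_stop=False))  # 64
```

Even a 100-residue protein typically has far more than 10^40 encoding sequences. This is why such alternate sequences are claimed as a class rather than listed individually.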

The ribonucleic acid compounds of the present
invention may be prepared using the polynucleotide synthetic
methods discussed supra, or they may be prepared
enzymatically using RNA polymerase to transcribe a DNA
template.
The most preferred systems for preparing the
ribonucleic acids of the present invention employ the RNA
polymerase from the bacteriophage T7 or the bacteriophage
SP6. These RNA polymerases are highly specific, requiring
the insertion of bacteriophage-specific sequences at the 5'
end of the template to be transcribed. See J. Sambrook, et
al., supra, at 18.82-18.84.
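The following is a minimal sketch of how a T7 run-off template might be laid out. It assumes the commonly used minimal T7 promoter TAATACGACTCACTATAG, whose final G is the first transcribed nucleotide (+1), and a linearized template that ends at the last base to be transcribed. The function names are hypothetical. SP6 templates would follow the same pattern with the SP6 promoter.

```python
T7_PROMOTER = "TAATACGACTCACTATAG"  # minimal T7 promoter; final G is the +1 position


def t7_template(insert: str) -> str:
    """Return the sense-strand template: T7 promoter followed by the insert."""
    return T7_PROMOTER + insert.upper()


def predicted_runoff_transcript(template: str) -> str:
    """Predict the run-off RNA from a linear template built by t7_template().

    Transcription starts at the promoter's final G (+1) and runs to the end
    of the linear template; the RNA matches the sense strand with U for T.
    """
    start = template.find(T7_PROMOTER)
    if start < 0:
        raise ValueError("no T7 promoter found")
    plus_one = start + len(T7_PROMOTER) - 1
    return template[plus_one:].replace("T", "U")


if __name__ == "__main__":
    print(predicted_runoff_transcript(t7_template("ggatgc")))  # GGGAUGC
```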

This invention also provides nucleic acids, RNA or
DNA, which are complementary to the sequences disclosed
herein.
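A complementary strand written 5' to 3' is the reverse complement of the disclosed sequence. The following is a minimal sketch. It treats U in the input as T, passes any character other than A, C, G, T, or N through unchanged, and optionally reports the result as RNA. The function name is hypothetical.

```python
DNA_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")  # other characters pass through unchanged


def reverse_complement(seq: str, rna: bool = False) -> str:
    """Return the reverse complement of a DNA (or RNA) sequence, written 5' to 3'.

    Input U is treated as T; set rna=True to report the result with U instead of T.
    """
    dna = seq.upper().replace("U", "T")
    rc = dna.translate(DNA_COMPLEMENT)[::-1]
    return rc.replace("T", "U") if rna else rc


if __name__ == "__main__":
    print(reverse_complement("ATGCCG"))            # CGGCAT
    print(reverse_complement("ATGCCG", rna=True))  # CGGCAU
```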

The present invention also provides probes and
primers useful for a variety of molecular biology techniques
including, for example, hybridization screens of genomic or
subgenomic libraries, detection and quantification of mRNA
species as a means of analyzing gene expression, and
amplification of any region of the Streptococcus pneumoniae
genome disclosed by the sequences herein. A nucleic acid
compound is provided comprising any of the sequences
disclosed herein, or a complementary sequence thereof, or a
fragment thereof, which is at least 15 base pairs in length
and which will hybridize selectively to Streptococcus
pneumoniae DNA or mRNA. Preferably, the 15 or more base pair
compound is DNA. A probe or primer length of at least 15
base pairs is dictated by theoretical and practical
considerations. See, e.g., B. Wallace and G. Miyada,
"Oligonucleotide Probes for the Screening of Recombinant DNA
Libraries," in Methods in Enzymology, Vol. 152, 432-442,
Academic Press (1987).
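Two rough calculations illustrate the theoretical considerations behind a minimum probe length. The first estimates how often a given n-mer would occur by chance in a genome. It assumes (unrealistically) independent, equiprobable bases and counts both strands. The genome size of about 2.1 Mb is an approximate figure for S. pneumoniae, used here only for illustration. The second is the Wallace rule, Tm ≈ 2(A+T) + 4(G+C) °C, a rough estimate often applied to short oligonucleotides in high-salt hybridization buffer. Both are heuristics and do not replace empirical optimization. The function names are hypothetical.

```python
def expected_random_matches(probe_len: int, genome_bp: float = 2.1e6) -> float:
    """Expected chance occurrences of one specific n-mer in a random genome (both strands).

    genome_bp defaults to an approximate, illustrative S. pneumoniae genome size.
    """
    return 2 * genome_bp / 4 ** probe_len


def wallace_tm(oligo: str) -> int:
    """Wallace rule estimate of Tm (degrees C) for a short DNA oligonucleotide."""
    s = oligo.upper()
    at = s.count("A") + s.count("T")
    gc = s.count("G") + s.count("C")
    return 2 * at + 4 * gc


if __name__ == "__main__":
    for n in (10, 12, 15, 18):
        print(n, f"{expected_random_matches(n):.3g}")
    print(wallace_tm("ATGCATGCATGCATG"))  # 15-mer: 8 A/T, 7 G/C -> 44
```

Under these assumptions, a random 10-mer is expected to occur a few times per genome, whereas a 15-mer is expected to occur far less than once. This is consistent with choosing about 15 bases as a lower bound for selective hybridization.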

The probes and primers of this invention can be
prepared by methods well known to those skilled in the art
(see, e.g., Sambrook et al., supra). In a most preferred
embodiment these probes and primers are synthesized by the
polymerase chain reaction (PCR).
The present invention also relates to recombinant
DNA cloning vectors and expression vectors comprising the
nucleic acids of the present invention. Preferred nucleic
acid vectors are those which comprise DNA. The skilled
artisan understands that choosing the most appropriate
cloning vector or expression vector depends on a number of
factors including the availability of restriction enzyme
sites, the type of host cell into which the vector is to be
transfected or transformed, the purpose of the transfection
or transformation (e.g., stable transformation as an
extrachromosomal element, or integration into the host
chromosome), the presence or absence of readily assayable or
selectable markers (e.g., antibiotic resistance and
metabolic markers of one type or another), and the number
of gene copies desired in the host cell.

Vectors suitable to carry the nucleic acids of the
present invention comprise RNA viruses, DNA viruses, lytic
bacteriophages, lysogenic bacteriophages, stable
bacteriophages, plasmids, viroids, and the like. The most
preferred vectors are plasmids.

Host cells harboring the nucleic acids disclosed
herein are also provided by the present invention. A
preferred host is E. coli which has been transfected or
transformed with a vector that comprises a nucleic acid of
the present invention.

The present invention also provides a method for
constructing a recombinant host cell capable of expressing
an ORF disclosed herein, said method comprising transforming
or otherwise introducing into a host cell a recombinant DNA
vector that comprises an isolated DNA sequence which encodes
said ORF. The preferred host cell is any strain of E. coli
which can accommodate high level expression of an exogenously
introduced gene. Transformed host cells are cultured under
conditions well known to skilled artisans such that said ORF
is expressed, thereby producing the encoded protein in the
recombinant host cell.

For the purpose of discovering new inhibitors of
cell wall biosynthesis, it would be desirable to identify
agents that inhibit enzymes required for synthesis of the
cell wall and/or agents that interact with membrane
proteins. A method for identifying compounds that interact
with such enzymes and membrane proteins comprises contacting
said proteins with a test compound and monitoring an
interaction and/or inhibition by any suitable means.

The instant invention provides a screening system 
for compounds that interact with membrane proteins of this 
invention, said screening system comprising the steps of: 

a) preparing a membrane protein, or membranes
enriched in said protein;

b) exposing the protein source of (a) to a test 
compound; and 

c) quantifying the interaction of said protein with 
said compound by any suitable means. 


The screening method of this invention may be 
adapted to automated procedures such as a PANDEX® (Baxter- 
Dade Diagnostics) system, allowing for efficient high-volume 
screening of compounds. 

In a typical screening protocol, a protein to be
tested is prepared as described herein, preferably using
recombinant DNA technology. A test compound is introduced
into a reaction vessel containing said protein. The
reaction or interaction of said protein and said compound is
monitored by any suitable means. For example, a
radioactively-labeled or chemically-labeled compound or
protein may be used to monitor specific association between
the test compound and the protein.

The following examples more fully describe the
present invention. Those skilled in the art will recognize
that the particular reagents, equipment, and procedures
described are merely illustrative and are not intended to
limit the present invention in any manner.

EXAMPLE 1 

Vector for Expressing an S. pneumoniae ORF in a Host Cell

An expression vector suitable for expressing an
S. pneumoniae gene or fragment thereof in a variety of
procaryotic host cells, such as E. coli, is easily made. A
suitable parent vector contains an origin of replication
(Ori), a marker for selecting transformants, for example, an
ampicillin resistance gene (Amp), and further comprises
suitable transcriptional and translational signals, for
example, the T7 promoter and T7 terminator sequences, in
operable linkage to an S. pneumoniae coding region. For
example, pET11A (obtained from Novagen, Madison WI) is
linearized by restriction with endonucleases NdeI and BamHI.
Linearized pET11A is ligated to a DNA fragment bearing NdeI
and BamHI sticky ends and comprising a coding region for an
S. pneumoniae ORF.

The ORF used in this construction may be modified
at the 5' end (amino terminus of the encoded protein or
peptide) to simplify purification of the encoded protein or
peptide. For this purpose, an oligonucleotide encoding 8
histidine residues is inserted after the transcriptional and
translational start sites. Placement of the histidine
residues at the amino terminus of the encoded protein
enables the IMAC one-step protein purification procedure.
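Before a coding region is cloned between the NdeI and BamHI sites, it should be checked for internal copies of those sites. Any such copies would be cut during digestion. The following is a minimal sketch that uses the NdeI (CATATG) and BamHI (GGATCC) recognition sequences. Both are palindromic, so scanning one strand finds sites on both. The helper names are hypothetical. The sketch also omits the extra flanking bases that are usually added so the enzymes can cut near fragment ends.

```python
SITES = {"NdeI": "CATATG", "BamHI": "GGATCC"}  # palindromic recognition sequences


def find_sites(seq: str, sites: dict[str, str] = SITES) -> dict[str, list[int]]:
    """Return 0-based start positions of each recognition sequence in `seq`."""
    seq = seq.upper()
    hits: dict[str, list[int]] = {}
    for name, motif in sites.items():
        positions = []
        pos = seq.find(motif)
        while pos != -1:
            positions.append(pos)
            pos = seq.find(motif, pos + 1)  # step by 1 so overlapping sites are found
        hits[name] = positions
    return hits


def ndei_bamhi_fragment(orf: str) -> str:
    """Place the ATG start codon inside an NdeI site (CA^TATG) and append a BamHI site."""
    orf = orf.upper()
    if not orf.startswith("ATG"):
        raise ValueError("ORF must begin with ATG")
    if any(find_sites(orf).values()):
        raise ValueError("ORF contains an internal NdeI or BamHI site")
    fragment = "CAT" + orf + "GGATCC"
    counts = {name: len(pos) for name, pos in find_sites(fragment).items()}
    if counts != {"NdeI": 1, "BamHI": 1}:
        raise ValueError("junctions create an extra NdeI or BamHI site")
    return fragment


if __name__ == "__main__":
    print(find_sites("AACATATGGGATCCTT"))     # {'NdeI': [2], 'BamHI': [8]}
    print(ndei_bamhi_fragment("ATGAAATAA"))   # CATATGAAATAAGGATCC
```

NdeI is a convenient choice for the 5' junction because its recognition sequence contains ATG. As a result, the start codon can lie within the cloning site itself.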



Example 2
Recombinant Expression and Purification of a Protein Encoded
by an S. pneumoniae ORF
An expression vector that carries an ORF from the
S. pneumoniae genome, as disclosed in Example 1, and which
ORF is operably linked to an expression promoter, is
transformed into E. coli BL21(DE3) (hsdS gal λcIts857
ind1 Sam7 nin5 lacUV5-T7 gene 1) using standard methods.
Transformants, selected for resistance to ampicillin, are
chosen at random and tested for the presence of the vector
by agarose gel electrophoresis using quick plasmid
preparations. Colonies that contain the vector are grown in
L broth and the protein produced by the vector-borne ORF is
purified by IMAC, essentially as described in US Patent
4,569,794.

Briefly, the IMAC column is prepared as follows. A
metal-free chelating resin (e.g., Sepharose 6B IDA,
Pharmacia) is washed in distilled water to remove
preservatives and then infused with a suitable metal ion
[e.g., Ni(II), Co(II), or Cu(II)] by adding a 50 mM metal
chloride or metal sulfate aqueous solution until about 75%
of the interstitial spaces of the resin are saturated with
colored metal ion. The column is then ready to receive a
crude cellular extract containing the recombinant protein
product.

Unbound proteins and other materials are removed by
washing the column with any suitable buffer, pH 7.5. Bound
protein is eluted in any suitable buffer at pH 4.3, or
preferably with an imidazole-containing buffer at pH 7.5.



Example 3

DNA Chip Production 
Any one or more of the S. pneumoniae genome DNA
fragments disclosed herein, or fragments thereof, are arrayed
onto a solid support. It is preferred that fragments be in
the size range of 14 base pairs to 500 base pairs. The DNA
samples are most conveniently synthesized by PCR using
standard methods to amplify regions disclosed by the genomic
sequences herein. The method of Schena et al. is used to
spot about 1 ng to 10 ng of a DNA sample onto glass
microscope slides that have been treated with poly-L-lysine
(M. Schena et al., "Quantitative monitoring of gene
expression patterns with a complementary DNA microarray,"
Science, 270, 467-470, 1995). After spotting DNA samples
onto the chip and air-drying, the chips are rehydrated by
incubation for about 2 hours in a humid chamber. Chips are
then placed at 100° C for 1 minute, rinsed in 0.1% SDS, and
treated with 0.05% succinic anhydride in 50%
1-methyl-2-pyrrolidinone and 50% boric acid.
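The number of molecules in each spot can be estimated from the spotted mass and the fragment length, which helps put the 1 ng to 10 ng spotting amount in perspective. The following is a minimal sketch. It assumes an average mass of about 650 g/mol per base pair of double-stranded DNA, a commonly used approximation, and it ignores end groups. The function name is hypothetical.

```python
AVOGADRO = 6.02214076e23   # molecules per mole
BP_MASS_G_PER_MOL = 650.0  # approximate average mass of one double-stranded base pair


def dsdna_copies(mass_ng: float, length_bp: int) -> float:
    """Approximate number of double-stranded DNA molecules in mass_ng of a fragment."""
    grams = mass_ng * 1e-9
    moles = grams / (length_bp * BP_MASS_G_PER_MOL)
    return moles * AVOGADRO


if __name__ == "__main__":
    for ng in (1, 10):
        for bp in (14, 500):
            print(f"{ng} ng of {bp} bp: {dsdna_copies(ng, bp):.2e} copies")
```

Under this approximation, 1 ng of a 500 bp fragment corresponds to roughly 2 x 10^9 molecules.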


Example 4 

S. pneumoniae Gene Expression Analysis using DNA Chips
RNA prepared from cells grown under any desirable
conditions serves as the template for cDNA synthesis by reverse
transcription, using methods well known to the skilled
artisan (see, e.g., Molecular Cloning, 2d Ed., J. Sambrook, E.
Fritsch, T. Maniatis, 1989). For example, total RNA of
strain R6 is prepared according to the method of Logemann
et al. (Analytical Biochemistry, 1987, 163, 16-20) using
guanidine hydrochloride. After ethanol precipitation, the
total RNA is dissolved in a buffered solution such as
Tris-EDTA (TE). Complementary DNAs are synthesized with the aid
of the StrataScript RT-PCR kit (Stratagene, Inc.) in
accordance with the supplier's recommendations (see Schena
et al., id.). Briefly, a 50 µl reaction contains about 0.1
µg/µl of RNA. First strand synthesis is primed using random
primers, 1X first strand buffer, 0.03 U/µl ribonuclease
block, 500 µM dATP, 500 µM dTTP, 40 µM dGTP, 40 µM
fluorescein-12-dCTP (New England Nuclear), and 0.03 U/µl
reverse transcriptase. Reactions are incubated for 60
minutes at 37° C, precipitated with ethanol, and resuspended
in 10 µl TE pH 8. Samples are heated for 3 minutes at 94° C
and chilled on ice. The RNA is degraded by adding 0.25 µl of
10 N NaOH, followed by a 10 minute incubation at 37° C. The
samples are neutralized with 2.5 µl of 1 M Tris-HCl, pH 8, and
0.25 µl of 10 N HCl. After ethanol precipitation, the
nucleic acid pellet is washed and dried in vacuo.
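The final concentrations in the 50 µl reaction can be converted into pipetting volumes with C1V1 = C2V2. The following is a minimal sketch. The stock concentrations used below (10 mM dNTPs, 1 mM fluorescein-12-dCTP, 1 µg/µl RNA) are assumptions for illustration and are not given in this example. The function name is hypothetical.

```python
def stock_volume(stock_conc: float, final_conc: float, final_volume: float) -> float:
    """Volume of stock needed (same volume units as final_volume), via C1*V1 = C2*V2.

    stock_conc and final_conc must use the same concentration units.
    """
    if final_conc > stock_conc:
        raise ValueError("final concentration cannot exceed stock concentration")
    return final_conc * final_volume / stock_conc


if __name__ == "__main__":
    reaction_ul = 50.0
    # Hypothetical stocks: 10 mM (10000 uM) dNTPs, 1 mM labeled dCTP, 1 ug/ul RNA.
    recipe = {
        "dATP (500 uM)": stock_volume(10000, 500, reaction_ul),
        "dTTP (500 uM)": stock_volume(10000, 500, reaction_ul),
        "dGTP (40 uM)": stock_volume(10000, 40, reaction_ul),
        "fluorescein-12-dCTP (40 uM)": stock_volume(1000, 40, reaction_ul),
        "RNA (0.1 ug/ul)": stock_volume(1.0, 0.1, reaction_ul),
    }
    for component, ul in recipe.items():
        print(f"{component}: {ul:.2f} ul")
```

With these assumed stocks, 0.1 µg/µl RNA in 50 µl corresponds to about 5 µg of total RNA per reaction.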

Prior to hybridization, DNA chips prepared as in Example
3 are denatured by heating to 90° C for 2 minutes.
Hybridization reactions contain about 1 µl of fluorescently
labeled cDNA and 1 µl of hybridization buffer (10x SSC and
0.2% SDS). Probe mixtures are transferred to the surface of
the chip, covered with a cover slip, and incubated for 18
hours at 65° C. Chips are washed for 5 minutes at room
temperature in 1X SSC, 0.1% SDS, then for 10 minutes at room
temperature in 0.1X SSC, 0.1% SDS. After hybridization,
chips are scanned with a laser-scanning device.
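In this example each chip is hybridized with a single fluorescein-labeled cDNA. Comparing two growth conditions therefore means comparing spot intensities across two chips. The following is a minimal sketch. It assumes background-subtracted intensities keyed by spot (fragment) ID, a global normalization to equal total signal, and a hypothetical two-fold cutoff. Practical analyses generally also use replicate chips and statistical testing. The function names are hypothetical.

```python
import math


def normalize(intensities: dict[str, float]) -> dict[str, float]:
    """Scale spot intensities so that each chip's total signal sums to 1."""
    total = sum(intensities.values())
    if total <= 0:
        raise ValueError("total intensity must be positive")
    return {spot: value / total for spot, value in intensities.items()}


def differential_spots(chip_a: dict[str, float], chip_b: dict[str, float],
                       min_fold: float = 2.0, floor: float = 1e-6) -> dict[str, float]:
    """Return log2(A/B) for spots whose normalized ratio is at least min_fold either way.

    `floor` stands in for spots with no detectable signal so ratios stay finite.
    """
    a, b = normalize(chip_a), normalize(chip_b)
    cutoff = math.log2(min_fold)
    result = {}
    for spot in sorted(a.keys() & b.keys()):  # compare only spots present on both chips
        lr = math.log2(max(a[spot], floor) / max(b[spot], floor))
        if abs(lr) >= cutoff:
            result[spot] = lr
    return result


if __name__ == "__main__":
    in_vivo = {"f1": 100.0, "f2": 100.0, "f3": 800.0}
    in_vitro = {"f1": 100.0, "f2": 400.0, "f3": 500.0}
    print(differential_spots(in_vivo, in_vitro))  # {'f2': -2.0}
```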
Example 5

A DNA Bio Chip for mutation analysis 
Duplicate DNA chips are prepared as in Example 3. Each
chip is overlaid with S. pneumoniae cells in a semi-solid
medium, wherein said cells carry a temperature-sensitive
(ts) mutation in a gene required for autolytic activity
(Lyt-). This mutation leads to resistance to lysis at 37° C,
but sensitivity to lytic treatments at 30° C.

S. pneumoniae strain cwl is resistant to lysis by
detergent and penicillin when grown at 37° C, but remains
sensitive when grown at 30° C (cwl is derived from strain
R6; see P. Garcia et al., "Mutants of Streptococcus
pneumoniae that contain a temperature-sensitive autolysin," J.
Gen. Microbiol. 132, 1401-05, 1986). Strain cwl is grown at
30° C and competent cells are prepared according to any
suitable method (e.g., LeBlanc et al., Plasmid 28, 130-145,
1992; Pozzi et al., J. Bacteriol. 178, 6087-6090, 1996).
Competent cwl cells are harvested by centrifugation and
resuspended at about 10⁵ cells per ml in 1% melted agar
supplemented with 0.1% (w/v) yeast extract (Difco) and
containing 1% to 2% Triton X-100. Approximately 100 µl to
500 µl of the cell mixture is deposited per square
centimeter onto the bio-chip by pipetting onto the chip
surface. After solidification of the agar layer, one of the
bio-chips is incubated at 37° C and the other at 30° C.
Cells that take up a complementing genomic DNA fragment from
the chip surface will be lysed at both 30° C and 37° C,
while non-complemented cells are lysed only at 30° C. Cells
that are complemented by the bio-chip are recognizable by
this phenotypic difference and can be further purified by
well known methods.
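The stated overlay figures determine how many competent cells cover each square centimeter of chip. This number bounds how many cells can sit over any one spot. The following is a minimal arithmetic sketch using the figures given above: about 10⁵ cells per ml, and 100 µl to 500 µl per cm². The function name is hypothetical.

```python
def cells_per_cm2(cells_per_ml: float, overlay_ul_per_cm2: float) -> float:
    """Cells deposited per square centimeter of chip for a given overlay volume."""
    return cells_per_ml * overlay_ul_per_cm2 / 1000.0  # 1 ml = 1000 ul


if __name__ == "__main__":
    for vol in (100, 500):
        print(vol, "ul/cm2 ->", cells_per_cm2(1e5, vol), "cells/cm2")  # 10000.0, 50000.0
```

At the stated settings, the overlay therefore places roughly 10⁴ to 5 x 10⁴ cells on each square centimeter of the chip.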
CLAIMS

1. An isolated nucleic acid compound comprising a
sequence identical to or substantially identical to a
sequence selected from the group consisting of SEQ ID NO: 1
through SEQ ID NO: 86.

2. An isolated nucleic acid compound comprising a 
sequence identical to or substantially identical to a 
sequence selected from the group consisting of SEQ ID NO: 87, 
SEQ ID NO: 89, SEQ ID NO: 91, SEQ ID NO: 93, SEQ ID NO: 95, SEQ 
ID NO: 97, SEQ ID NO: 99, SEQ ID NO: 101, SEQ ID NO: 103, SEQ ID 
NO: 105, SEQ ID NO: 107, SEQ ID NO: 109, SEQ ID NO: 111, SEQ ID 
NO: 113, SEQ ID NO: 115, SEQ ID NO: 117, SEQ ID NO: 119, and SEQ 
ID NO:121. 

3. A substantially purified protein or fragment 
thereof from S. pneumoniae wherein said protein is selected 
from the group consisting of SEQ ID NO: 88, SEQ ID NO: 90, SEQ 
ID NO: 92, SEQ ID NO: 94, SEQ ID NO: 96, SEQ ID NO: 98, SEQ ID 
NO: 100, SEQ ID NO: 102, SEQ ID NO: 104, SEQ ID NO: 106, SEQ ID 
NO: 108, SEQ ID NO: 110, SEQ ID NO: 112, SEQ ID NO: 114, SEQ ID 
NO: 116, SEQ ID NO: 118, SEQ ID NO: 120, SEQ ID NO: 122, and SEQ 
ID NO: 123 through SEQ ID NO: 228. 

4. An isolated nucleic acid compound encoding any 
one of the proteins or fragments thereof of Claim 3. 

5. A vector comprising any one of the nucleic acid
compounds of claims 1, 2, or 4.

6. A recombinant host containing any one of the 
vectors of claim 5. 
7. A substantially purified protein from
Streptococcus pneumoniae as in Claim 3 wherein said protein
is an external target protein selected from Table 1.

8. A substantially purified protein from
Streptococcus pneumoniae as in Claim 3 wherein said protein
is a hypothetical protein selected from Table 1.

9. A substantially purified protein from
Streptococcus pneumoniae as in Claim 3 wherein said protein
is a cell wall synthetic protein selected from Table 1.

10. A substantially purified protein from
Streptococcus pneumoniae as in Claim 3 wherein said protein
is a minimal gene set protein selected from Table 1.

11. A DNA chip having arrayed thereon any at least 
15 base pair fragment of any one or more of the nucleic 
acids of claim 1. 


12. A DNA chip having arrayed thereon any at least 
15 base pair fragment of any one or more of the nucleic 
acids of claim 2. 

13. A method for evaluating gene expression in
Streptococcus pneumoniae comprising the step of incubating a
DNA chip of Claim 11 or Claim 12 with cDNA prepared from
Streptococcus pneumoniae under conditions suitable for
hybridization of complementary nucleic acid sequences.






14. A computer readable medium having recorded
thereon any one or more of the nucleotide sequences of
Claim 1 or Claim 2.
15. A method for identifying virulence genes in S.
pneumoniae, comprising the steps of: 

a) preparing a DNA chip as in claim 11, 

b) preparing labeled cDNAs from 

i) S. pneumoniae cells recovered from an in
vivo environment, and

ii) S. pneumoniae cells grown in vitro, 

c) hybridizing individually the cDNAs of steps
(b)(i) and (b)(ii) to a chip of step (a); and

d) identifying a genomic DNA fragment or fragments
on said chip that hybridize to the cDNAs of (b)(i) but not
to the cDNAs of (b)(ii).

16. An antibody that selectively binds to a
protein or peptide of Claim 3.

17. An antibody that selectively binds to an 
external target protein, or fragment thereof, identified in 
Table 1. 


18. A DNA chip of Claim 11 or Claim 12 further
comprising a layer of S. pneumoniae cells wherein said layer
contacts said nucleic acids.